Abstract

There is strong expectation that cities, across time, culture and level of development, share
much in common in terms of their form and function. Recently, attempts to formalize
mathematically these expectations have led to the hypothesis of urban scaling, namely
that certain properties of all cities change, on average, with their size in predictable
scale-invariant ways. The emergence of these scaling relations depends on a few general
properties of cities as social networks, co-located in space and time, that conceivably
apply to a wide range of human settlements. Here, we discuss the present evidence
for the hypothesis of urban scaling, some of the methodological issues dealing with proxy
measurements and units of analysis and place these findings in the context of other theories
of cities and urban systems. We show that a large body of evidence about the scaling
properties of cities indicates, in analogy to other complex systems, that they cannot be
treated as extensive systems and discuss the consequences of these results for an emerging
statistical theory of cities.

Introduction

There is a general recognition, common to many disciplines, that cities regardless of their
size, geography, time or culture share many underlying organizational, social and economic
characteristics, and play similar functional roles in different human societies [1-4].
A citizen of New York City will quickly understand how Tokyo works. She will also find
a small town anywhere straightforward to navigate, if a little uneventful. When Cortes's
men arrived in Tenochtitlan (today's Mexico City) in 1519, Bernal Diaz del Castillo famously
described the city as spectacular for its scale (~200,000 people, one of the largest
cities of the time) and wealth [5]. But perhaps the true surprise should have been - given
its independent development from old-world cities - how familiar it all was, in terms of its
roads and canals, its public buildings and neighborhood organization and its markets and
social life [5]. The same could have been said of countless accounts of travelers, historians
and anthropologists. There is a sense in which human settlements of ancient Mesopotamia
and of modern developed nations share enough in common that the term "cities" can be
used to meaningfully refer to entities separated by thousands of years of cultural, social
and technological development [1,2,6]. All of this suggests that the functional role of
cities in human societies, as well as some of the general aspects of their internal
organization, may be universal: they may be expected to develop in urban systems that arose
and evolved independently and hold across time, culture and level of technology. Cities,
from this perspective, are variations - or perhaps better, elaborations - on a theme [7,8].

The endeavor to discover general mathematical regularities of urban life, "laws of
cities" if you like, is relatively new but increasingly possible given the growing availability
of more and better data, and the multi-disciplinary scientific interest in the subject
[1,6,7,9-11]. Interest is also guided by the socioeconomic imperative of understanding cities
in a fast urbanizing world [12]. The idea that cities - in some specific sense - are self-similar
is what we call the hypothesis of urban scaling. In its strongest form it states that essential
properties of cities in terms of their infrastructure and socio-economics are functions of
their population size in a way that is scale invariant and that these scale transformations
are common to all urban systems and over time. This means that there is no break in scale
- no minimum or maximum population size - across which a city becomes no longer a city:
say a village or a megalopolis. Cities of different sizes are not, however, the same because
many important scaling transformations are non-linear [7,8,13]. As a consequence, there
are predictable and quantitative per capita savings in material infrastructure and increases
in socio-economic productivity and innovation as a function of population size. Any urban
system is ultimately rooted in material resources derived from food, energy and other basic
materials but it is the connection of many (smaller) settlements with larger cities that
drives the system as a whole to greater resource and economic efficiency and productivity,
and permits increasing returns to the population scale of large cities in terms of innovation
and wealth creation [6,7,14]. These are ultimately the reasons why cities exist and can
continue to grow.

The urban scaling hypothesis, and the research that supports it, has non-trivial and
subtle implications - empirical, methodological and statistical - while also overlapping
with established research perspectives on cities [14-17]. In the present discussion we
address some of these implications and points of contact seeking to clarify and expand
arguments presented elsewhere. Specifically, we consider the following questions:

1. Is the urban scaling hypothesis synonymous with power-law behavior for urban 
observables? 

2. What is the relevant spatial unit of analysis? 

3. Over which range of population sizes do important attributes of cities exhibit scale 
invariance? 

4. What scaling properties can be expected to result from the mixture of local and
national-level effects? 

5. How does the scaling hypothesis help us understand what type of complex system 
a city is? 

6. What are some of the methodological difficulties that arise when studying the
properties of cities from currently available data?

7. How does the urban scaling hypothesis relate to and complement the existing body 
of work on urbanization and urban dynamics? 

Although the scaling hypothesis is a good general description of many properties of
cities across time and space, there are, of course, plausible counter-arguments. It is
possible, for instance, that below a certain population size a town may no longer have
the functional properties of a larger city. Archeologists, for example, struggle with this
distinction: When can we say that a settlement is urban? Does it require crossing a
sharp threshold of density and population size, as is often stated by census bureaus [18]
and well known definitions of urbanism [10], or is there a more diffuse but non-linear
continuum? Can cities exist without a developed urban system? Here, we discuss these
concepts in greater detail, present new empirical evidence over a greater number of urban
systems and a larger range of city sizes and propose several tests of the urban scaling
hypothesis. In so doing, we hope to sharpen the contribution made by research on urban
scaling towards understanding the essential features of urban life.



Formalizing the hypothesis of urban scaling 

The statement that functional urban quantities should be scale invariant follows from 
the general observation that they are characteristic of population agglomerations of all 
sizes, from the smallest towns up to the largest mega-cities. By functional we mean 
the size of a place's economy, its amount of conflict, the extent of its infrastructure, its 
rate of innovation, etc. We do not mean, however, specific proxies for these quantities 
such as amount of precious metals, murders by gunshot, number of patents, or of R&D 
researchers, etc. which are quantities specific to urban systems at a given time, particular 
technological context and level of socio-economic development. 

The requirement of urban scaling is that any average functional quantity, $Y$, is scale
invariant, meaning specifically that

$$ Y(\lambda N)/Y(N) = f(\lambda), \tag{1} $$

where the function $f(\lambda)$ is independent of population size, $N$, but does depend on the
arbitrary relative population size, $\lambda > 0$. This simple assumption has been restated many
times, for several decades and across several different disciplines [19,20], but it has not
always been recognized that it implies the scaling relation

$$ Y(N) = Y_0 N^{\beta}, \tag{2} $$

as can be verified by direct substitution. The constants in $N$, $Y_0$ and $\beta$, determine the scale
of $Y$, more precisely $Y_0 = Y(N = 1)$, and the relative increase in the rate of $Y$ in terms of
the rate of $N$, that is $\beta = d_t \ln(Y)/d_t \ln(N)$. In general $Y_0$ is time dependent and varies
from one urban system to another. The exponent $\beta$ is in general time independent (or
slowly varying, at least) and takes similar values for similar quantities in different urban
systems [7,8,21], whether geographically or over time. To prove that Eq. (1) implies the
scaling law in Eq. (2), consider that $Y(N)$ is independent of the arbitrary parameter $\lambda$,
which means:

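$$ \frac{dY(N)}{d\lambda} = 0 \;\rightarrow\; \frac{1}{f(\lambda)} \frac{dY(\lambda N)}{d\lambda} - \frac{1}{f^2(\lambda)} \frac{df(\lambda)}{d\lambda}\, Y(\lambda N) = 0. \tag{3} $$

Note that

$$ \frac{dY(\lambda N)}{d\lambda} = \frac{dx}{d\lambda} \frac{dN}{dx} \frac{dY(\lambda N)}{dN} = \frac{N}{\lambda} \frac{dY(\lambda N)}{dN}, \tag{4} $$

with $x = \lambda N$. Then, because $dN/N = d\ln N$, we can write

$$ \frac{d\ln Y(\lambda N)}{d\ln N} = \frac{d\ln f(\lambda)}{d\ln \lambda}, \tag{5} $$

which can be integrated to give

$$ Y(\lambda N) = Y_0 \exp\left[ \int_0^{\ln \lambda N} \frac{d\ln f(\lambda)}{d\ln \lambda}\, d\ln N \right] = Y_0\, e^{\beta \ln \lambda N} = Y_0 (\lambda N)^{\beta}, \tag{6} $$

where $\beta = d\ln f(\lambda)/d\ln \lambda$. The derivative in Eq. (5) is independent of the specific value of $\lambda$,
which implies that $f(\lambda) = \lambda^{\beta}$. Because of this property we can choose any scale $N = \lambda N'$
above and write the scaling law in its simplest form of Eq. (2), as usual. This derivation
reveals the mathematical assumptions necessary and sufficient to obtain scaling laws for
cities.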

It is important to state clearly the meaning of Eq. (1). It says that cities are
self-similar in terms of scale $N$, so that regardless of their actual population size their average
properties can be inferred from knowledge of those at another city, whose population is
related to it by a scale transformation $\lambda$. Over the range of population scale where it
holds (and here it is assumed that it holds from $N = 1$ to infinity, but see discussion
below) this says that any functional property of a large city, organizational or dynamical,
is already present in a small town and vice versa, and that its quantitative prediction can
be made via a non-linear scale transformation.
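
The content of Eq. (1) can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: the function names, the exponent value 1.15 and the population sizes are illustrative assumptions. It shows that for a power law the ratio $Y(\lambda N)/Y(N)$ is the same at every $N$, whereas for a scale-dependent function such as $N \ln N$ it drifts with $N$.

```python
import math

def power_law(n: float, y0: float = 1.0, beta: float = 1.15) -> float:
    """Scale-invariant form of Eq. (2): Y(N) = Y0 * N**beta (beta is illustrative)."""
    return y0 * n ** beta

def n_log_n(n: float, c: float = 1.0) -> float:
    """A scale-dependent alternative, Y(N) = C * N * ln(N)."""
    return c * n * math.log(n)

def scale_ratio(func, lam: float, n: float) -> float:
    """Left-hand side of Eq. (1): Y(lambda * N) / Y(N)."""
    return func(lam * n) / func(n)

lam = 2.0
sizes = [1e3, 1e5, 1e7]  # hypothetical city sizes
power_ratios = [scale_ratio(power_law, lam, n) for n in sizes]
log_ratios = [scale_ratio(n_log_n, lam, n) for n in sizes]

# For the power law every ratio equals lam**beta, whatever N is.
print(power_ratios)
# For N ln N the ratio changes with N, so no N-independent f(lambda) exists.
print(log_ratios)
```

The first list is constant because $f(\lambda) = \lambda^{\beta}$; the second is not, which is the sense in which $N \ln N$ is scale dependent.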

So far we have only discussed urban quantities, $Y$, as if they were deterministic. It should
also be clear that urban scaling is an average property of cities, so that it does not
determine urban properties exactly but rather up to some level of uncertainty. An ultimate
theory of cities should provide predictions about urban indicator statistics, including the
expected value of deviations from the mean scaling prediction and the correlations in
time and space of these deviations. The beginnings of a statistical formulation of scaling
are presented in detail in Ref. [22], where it is shown how it emerges from the estimation
of the conditional probability, $P(Y|N)$, as the expectation value of urban quantities $Y$,
given a city's population $N$. These issues are discussed in detail below. Moreover, a
theoretical framework that predicts the observed values of scaling exponents across many
urban quantities from a few general properties of social and infrastructural urban networks
has recently been proposed in Ref. [21]. This translates the expectation of scale invariance
for urban indicators in terms of a few well known and well accepted generative principles
and provides a detailed derivation of Eq. (2) and its parameters (such as $\beta = 1 + \delta$,
$\delta \simeq 1/6$, for socioeconomic rates). This theory also derives under what conditions the
agglomeration advantages of cities may disappear, $\beta \to 1$.

For the strong properties dictated by self-similar scaling to hold in practice for all urban 
systems we must ask that cities are defined in a way that is consistent across scales. The 
approach to these issues from the point of view of the scaling hypothesis sheds some light 
on difficult problems in the study of cities, such as the choice of unit of analysis, the 
extent of urban self-similarity, the nature of proxy quantities and the presence of truly 
local urban effects vs. non-local (national) effects, to which we now turn. 



What is the unit of analysis? 

A major practical difficulty in studying the properties of cities is the choice of unit of 
analysis. Most official statistics pertain to somewhat arbitrarily defined administrative 
units, which, at some level, are not cities at all. Examples include counties or census 
tracts in the USA, several forms of local authority in the UK, prefectures in Japan, 
prefecture and district level cities in China, and municipalities in many European and 
South American nations. 

However, the appropriate definition of cities is functional, as strongly interacting,
co-located social networks. At the socioeconomic level there has been an increased effort to
define cities in these terms. So, for example, we have the current definitions of
micropolitan and metropolitan areas in the USA, which are in effect unified labor markets [18]
and similarly defined spatial units in Japan [23], several Latin American nations [22], the
European Union (i.e., "Larger Urban Zones" [24]), and so on. The definition of these
socioeconomic units requires measures of social interaction, or their proxies, which are
difficult to obtain and analyze unambiguously. The U.S. Census Bureau utilizes a
combination of population size, density and commuting flows data when defining metropolitan
areas, but no simple algorithm is provided [18]. Nevertheless, the smallest integer unit
that makes up any metropolitan area is a county. Similar ideas have been pursued in
Europe and Japan but the standardization of these definitions (especially in Europe) is
still at a developmental stage. Here, we want to emphasize and illustrate quantitatively
some of the potential biases in urban indicators that can arise when units of analysis that
are partial to a larger city or aggregates of cities are taken, instead of their appropriate
definitions. We focus on the estimation of scaling exponents $\beta$.

Consider a set of socioeconomic units with scale (e.g. population) $\{x_i\}$ and urban
indicator $\{y_i\}$, with $i \in \{1, \ldots, n\}$. Then consider also the log-transformed variables
$X_i = \ln x_i$, $Y_i = \ln y_i$. Then, the scaling exponent can be computed in practice via, for
example, the Ordinary-Least-Squares (OLS) estimator,

$$ \beta = \frac{\overline{XY} - \bar{X}\,\bar{Y}}{\overline{X^2} - \bar{X}^2}. \tag{7} $$

Here, the bars over the symbols denote sample averages, so that, for example

$$ \overline{XY} = \frac{1}{n} \sum_{i=1}^{n} X_i Y_i. \tag{8} $$

It is trivial to verify that, if we insert the relation $Y_i = \beta X_i$, we recover the value of the
exponent from Eq. (7).
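
As a concrete illustration of the OLS estimator of Eq. (7), the following minimal sketch computes the exponent from sample averages of the log-transformed variables. The function name and the synthetic, noise-free data (prefactor 3.0, exponent 1.12) are assumptions made for the example only.

```python
import math

def ols_exponent(populations, indicators):
    """Estimate beta with the sample-average form of Eq. (7) on log-transformed data."""
    xs = [math.log(x) for x in populations]
    ys = [math.log(y) for y in indicators]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_x2 = sum(x * x for x in xs) / n
    return (mean_xy - mean_x * mean_y) / (mean_x2 - mean_x ** 2)

# Hypothetical, noise-free cities obeying y = 3.0 * x**1.12
populations = [1e4, 5e4, 2e5, 1e6, 8e6]
indicators = [3.0 * x ** 1.12 for x in populations]

beta_hat = ols_exponent(populations, indicators)
print(beta_hat)  # recovers the exponent used to build the data, up to rounding
```

Because the sample means are subtracted, the estimate does not depend on the prefactor $Y_0$.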

Consider now a situation where the units $x_i$, $y_i$ are correct urban definitions, but
where we will further disaggregate them in terms of parts of a functional city according
to the transformation where we take a datum $\{x, y\}$ and express it in terms of a set
of points $\{w_i, z_i\}$ with $i = 1, \ldots, m$. Under this operation of disaggregation, the
log-transformed variables $W$, $Z$ become putative urban units in their own right (new cities)
and the estimator for $\beta$ changes, by an amount $\delta\beta$, as

$$ \beta' = \beta + \delta\beta = \frac{\overline{XY} - \bar{X}\,\bar{Y} + \delta_N}{\overline{X^2} - \bar{X}^2 + \delta_D}, \tag{9} $$

$$ \delta_N = \delta_{XY} - \bar{X}\delta_Y - \bar{Y}\delta_X - \delta_X \delta_Y, \qquad \delta_D = \delta_{X^2} - 2\bar{X}\delta_X - (\delta_X)^2, \tag{10} $$

where

$$ \delta_X = \frac{1}{n-1+m}\left[ m(\bar{W} - \bar{X}) + (\bar{X} - X) \right], \qquad \delta_Y = \frac{1}{n-1+m}\left[ m(\bar{Z} - \bar{Y}) + (\bar{Y} - Y) \right], $$

$$ \delta_{XY} = \frac{1}{n-1+m}\left[ m(\overline{WZ} - \overline{XY}) + (\overline{XY} - XY) \right], \qquad \delta_{X^2} = \frac{1}{n-1+m}\left[ m(\overline{W^2} - \overline{X^2}) + (\overline{X^2} - X^2) \right]. \tag{11} $$

Here the sample averages over the new variables $W$ and $Z$ are taken over their range,
that is, for example

$$ \bar{W} = \frac{1}{m} \sum_{i=1}^{m} W_i. \tag{12} $$

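This expression, though exact, is not very transparent. We can see what the change
in the exponent $\beta$ is, typically, by expanding Eq. (9) to first order in small $\delta_N$ and $\delta_D$, to
obtain

$$ \delta\beta \simeq \frac{\delta_N - \beta\,\delta_D}{\overline{X^2} - \bar{X}^2}. \tag{13} $$

This expression can be further simplified, if, without loss of generality, we choose to have
started with variables $X_i$ and $Y_i$ such that $\bar{X} = \bar{Y} = 0$. Then, collecting leading terms,
we obtain

$$ \delta\beta \simeq \frac{m\left(\overline{WZ} - \beta\,\overline{W^2}\right) + (\beta - \beta_{XY})\, X^2}{(n-1+m)\,\overline{X^2}}, \tag{14} $$

where

$$ \beta = \frac{\overline{XY}}{\overline{X^2}}, \qquad \beta_{XY} = \frac{XY}{X^2}, \qquad \beta_{WZ} = \frac{\overline{WZ} - \bar{W}\,\bar{Z}}{\overline{W^2} - \bar{W}^2}. \tag{15} $$
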



Thus, we see immediately that if all variables scale with the same exponent, $\beta$, the
correction vanishes, $\delta\beta = 0$. A small correction may occur if the variables $X$, $Y$, to be
disaggregated, are atypical in terms of scaling, that is if $\beta_{XY}$ is very different from $\beta$.
The most important change, however, results from the leading correction for large $m$, if the
variables $W_i$, $Z_i$ do not scale, or do so with a very different exponent from $\beta$. In general,
this correction can have either sign. If $W_i$ and $Z_i$ are anti-correlated, meaning, for example,
that parts of the city with greater population have lower incomes than expected from the
scaling relation and vice versa, this will reduce the expectation for $\beta$. This could result,
for example, if income was reported by place of work vs. place of residence in cities
organized in terms of central business districts and residential suburbs, or from strong
income inequality in terms of low density rich neighborhoods compared to high density
poor parts of the city. Thus, strong heterogeneity inside the city as well as specific forms of
reporting can bias exponent estimates, and naturally result in washing out agglomeration
effects. Only when the strongly mixing components of the city are aggregated together
can one see the city for what it is in terms of its socioeconomic performance.
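
The exact change of the estimator under disaggregation can be checked numerically. The sketch below is an illustration under stated assumptions: the four synthetic cities, the exponent 1.15 and the three-way split of the largest city are hypothetical. It compares a direct OLS fit on the disaggregated sample with the exponent obtained from the shifts of the sample averages, as in Eqs. (9)-(11).

```python
import math

def mean(values):
    return sum(values) / len(values)

def ols_slope(xs, ys):
    """OLS exponent for log-transformed data, as in Eq. (7)."""
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    mean_x2 = mean([x * x for x in xs])
    return (mean_xy - mean(xs) * mean(ys)) / (mean_x2 - mean(xs) ** 2)

def slope_after_disaggregation(xs, ys, k, ws, zs):
    """Exponent after datum k is replaced by the parts (ws, zs), via Eqs. (9)-(11)."""
    n, m = len(xs), len(ws)
    x_k, y_k = xs[k], ys[k]
    mx, my = mean(xs), mean(ys)
    mxy = mean([x * y for x, y in zip(xs, ys)])
    mx2 = mean([x * x for x in xs])
    norm = n - 1 + m
    # Shifts of the sample averages, Eq. (11)
    d_x = (m * (mean(ws) - mx) + (mx - x_k)) / norm
    d_y = (m * (mean(zs) - my) + (my - y_k)) / norm
    d_xy = (m * (mean([w * z for w, z in zip(ws, zs)]) - mxy) + (mxy - x_k * y_k)) / norm
    d_x2 = (m * (mean([w * w for w in ws]) - mx2) + (mx2 - x_k * x_k)) / norm
    # Numerator and denominator corrections, Eq. (10)
    d_num = d_xy - mx * d_y - my * d_x - d_x * d_y
    d_den = d_x2 - 2.0 * mx * d_x - d_x ** 2
    return (mxy - mx * my + d_num) / (mx2 - mx ** 2 + d_den)

beta = 1.15  # illustrative exponent
xs = [math.log(p) for p in (1e4, 1e5, 1e6, 1e7)]
ys = [beta * x for x in xs]

# Hypothetical split of the largest city into three parts whose shares of
# population and of the indicator differ (heterogeneity inside the city).
ws = [math.log(p) for p in (6e6, 3e6, 1e6)]
zs = [math.log(share * math.exp(ys[3])) for share in (0.7, 0.2, 0.1)]

direct = ols_slope(xs[:3] + ws, ys[:3] + zs)
via_deltas = slope_after_disaggregation(xs, ys, 3, ws, zs)
print(direct, via_deltas)  # the two routes agree up to rounding
```

The two numbers coincide because the shifts of the sample averages are exact, not perturbative; only Eq. (13) onwards involves an expansion.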

Likewise, it can be shown that when several cities are aggregated together the exponent
$\beta$ will tend to become more linear, as would be expected if a putative larger interacting
population fails to realize its full agglomeration effects in terms of socioeconomic output
or savings in material infrastructure. To show this, consider the aggregation of two cities
into one, so that $x_+ = x_1 + x_2$ and $y_+ = y_1 + y_2$. Then define, as above, $X_+ = \ln x_+$ and
$Y_+ = \ln y_+$. We find that, under this aggregation,

$$ \delta_{XY} = \frac{1}{n}\left( X_+ Y_+ - X_1 Y_1 - X_2 Y_2 \right), \qquad \delta_{X^2} = \frac{1}{n}\left( X_+^2 - X_1^2 - X_2^2 \right), \tag{16} $$

$$ \delta_X = \frac{1}{n}\left( X_+ - X_1 - X_2 \right), \qquad \delta_Y = \frac{1}{n}\left( Y_+ - Y_1 - Y_2 \right), \tag{17} $$

so that, to first order in $1/n$,

$$ \delta\beta \simeq \frac{1}{n \overline{X^2}} \left[ (\beta_+ - \beta) X_+^2 + (\beta - \beta_{12})(X_1^2 + X_2^2) \right], \tag{18} $$

where $X_+^2 \beta_+ = X_+ Y_+$ and $(X_1^2 + X_2^2)\beta_{12} = X_1 Y_1 + X_2 Y_2$. This last expression can be
simplified further when we assume that the original variables scale exactly, $\beta_{12} = \beta$, so
that we are left with the first term in Eq. (18), which can be written as

It is now easy to see that, for sufficiently large cities, the magnitude of the second term
is always larger, so that the correction to $\beta$ becomes negative if $\beta > 1$ and positive if
$\beta < 1$, thus shifting the true exponent towards unity, and underestimating agglomeration
effects. In particular if we write $\beta = 1 + \delta$, with $|\delta| \ll 1$ [21], this leads to

$$ \delta\beta \simeq -\delta\, \frac{X_+}{n \overline{X^2}} \left[ \ln(x_1 + x_2) - 1 \right], \tag{20} $$

which takes the opposite sign of the deviation, $\delta$, in the exponent from unity as long as
the sum of populations $x_1 + x_2 > e$.

Thus, we have shown that the choice of unit of analysis is crucial in estimating the 
correct urban scaling exponents. In general, if cities are under or over aggregated this can 
lead to the underestimation of the magnitude of urban agglomeration effects. 
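
The direction of the aggregation bias can be seen in a small numerical experiment. The sketch below is illustrative only: the city sizes, the exponents 1.15 and 0.85, and the choice of merging the two largest cities are assumptions. All cities obey exact scaling before the merge, so any change in the fitted exponent is due to over-aggregation alone.

```python
import math

def ols_exponent(cities):
    """OLS slope of ln(indicator) against ln(population)."""
    xs = [math.log(pop) for pop, _ in cities]
    ys = [math.log(ind) for _, ind in cities]
    n = len(cities)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

def merge_two_largest(cities):
    """Replace the two largest cities by one unit with summed population and indicator."""
    ordered = sorted(cities)  # tuples sort by population first
    (pop1, ind1), (pop2, ind2) = ordered[-2], ordered[-1]
    return ordered[:-2] + [(pop1 + pop2, ind1 + ind2)]

def exponent_before_and_after(beta):
    populations = (1e4, 3e4, 1e5, 3e5, 1e6, 3e6, 1e7)  # hypothetical urban system
    cities = [(pop, pop ** beta) for pop in populations]
    return ols_exponent(cities), ols_exponent(merge_two_largest(cities))

super_before, super_after = exponent_before_and_after(1.15)
sub_before, sub_after = exponent_before_and_after(0.85)
print(super_before, super_after)  # superlinear case: estimate moves down towards 1
print(sub_before, sub_after)      # sublinear case: estimate moves up towards 1
```

In both cases the merged unit lies off the scaling line (below it for $\beta > 1$, above it for $\beta < 1$) at the largest population, which pulls the fitted exponent towards unity.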



The extent of self-similarity

There is by now ample evidence that important properties of cities increase, on average,
faster (socioeconomic superlinearity) or slower (material infrastructure, sublinearity) than
city population size [7,8,11,21,26]. These properties hold across time and different urban
systems, even those at very different levels of development, though with different baselines,
$Y_0$. The issue is what is the best quantitative characterization of these size dependences
and in particular over which range of population do they manifest scale invariance.

Most presently available datasets of urban indicators have a relatively small range
of scales, at least when compared to systems in physics or biology. Even in the largest
urban systems cities are limited to tens of millions of people ($10^7$) and data are usually
not available for units below $N \sim 10^3 - 10^4$, leaving us with only about 3-4 orders of
magnitude. Urban indicators span a similar range. Over this range it is often the case
that our putative scale invariant functions, Eq. (2), can be modeled in terms of other
scale-dependent functions, such as $Y \sim N \log N$, or other functions with more parameters [26].
The fact that these other functions of $N$ can fit urban data as well as (but not better than)
power-laws in most observed cases, see Figs. 1-3, has been used as an argument against
the scale invariance of urban indicators [26]. However, this merely means that other
models may be viable explanations of the scaling behavior of urban indicators, along with
scale invariant functions. In science, a model can never be proven correct, of course, but
models can be excluded if they make not only empirically wrong predictions but also imply
counterfactual consequences or if they contradict fundamental theoretical expectations.
Here we discuss some of these issues.

It is always the case that any power law, especially with a power close to 1, can be
approximated in terms of a sum of powers in logarithms, as is obvious from the Taylor
expansion

$$ Y(N) = Y_0 N^{\beta} = Y_0 N e^{(\beta - 1)\ln N} = Y_0 N \left[ 1 + (\beta - 1)\ln N + \frac{1}{2}\left( (\beta - 1)\ln N \right)^2 + O\!\left[ \left( (\beta - 1)\ln N \right)^3 \right] \right]. \tag{21} $$

As long as $(\beta - 1)\ln N \ll 1$ the first few terms in the expansion give an excellent
approximation to the original function. Because $\ln N \sim 15$ and $\beta - 1 \sim 1/6$ we should
expect $(\beta - 1)\ln N$ not much larger than unity, see Figure 1A, where the expansion in
Eq. (21) is shown to produce an excellent match to data. Alternatively, by introducing a
scale in the problem, $N_m$, the function

$$ Y(N) = C N \ln\frac{N}{N_m} \tag{22} $$

typically improves the empirical fit at the cost of having implicitly assumed that as $N$
becomes small $Y$ vanishes, i.e. $N \to N_m$, $Y(N) \to 0$. For cities smaller than $N_m$ the
logarithmic fit predicts negative and fast diverging urban indicators.
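
To make the comparison concrete, the following minimal sketch evaluates the power law, its truncated logarithmic expansion, and the scale-adjusted logarithm $C N \ln(N/N_m)$. The parameter values (an exponent of 1.06, a population of one million, and a threshold scale of 200 people) are taken as illustrative assumptions, and the function names are hypothetical.

```python
import math

def power_law(n, y0, beta):
    """Y(N) = Y0 * N**beta."""
    return y0 * n ** beta

def truncated_expansion(n, y0, beta, order):
    """Expansion of the power law in powers of (beta - 1) * ln N, kept up to `order`."""
    u = (beta - 1.0) * math.log(n)
    return y0 * n * sum(u ** k / math.factorial(k) for k in range(order + 1))

def scale_adjusted_log(n, c, n_m):
    """Scale-adjusted logarithm, Y(N) = C * N * ln(N / N_m)."""
    return c * n * math.log(n / n_m)

beta = 1.06  # weakly superlinear exponent, so (beta - 1) * ln N stays below 1
n = 1e6
exact = power_law(n, 1.0, beta)
approx = truncated_expansion(n, 1.0, beta, order=2)
relative_error = abs(approx - exact) / exact
print(relative_error)  # a few percent with only three terms

n_m = 200.0  # illustrative threshold scale
print(scale_adjusted_log(n_m, 1.0, n_m))    # vanishes exactly at N = N_m
print(scale_adjusted_log(100.0, 1.0, n_m))  # negative for N < N_m
```

For larger exponents, or over a wider population range, more terms of the expansion are needed, which is why the match is best for quantities with $\beta$ close to 1.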

The observation that these functions can often fit the data as well as the power law
was made recently for GDP and personal income in US metropolitan areas [26] and is
indeed true empirically in most cases, see Figs. 1-2. As we already discussed above, our
reason for not considering Eq. (22) as a viable fit to urban data in our early analyses [7,8]
was its strange consequences, as it implies that common urban indicators, such
as personal income, wages, GDP and all others would vanish in towns below a critical
size (see Figs. 1-2) and presumably be zero or need to be specified in some additional
manner below $N_m$. Also, empirical fits with this form predict that $N_m$ would be quantity
specific, vary over time and from one urban system to another, see Figs. 1-3. Figure
1A shows the scaling of personal income with population for US cities larger than 10,000
people (micro- and metropolitan statistical areas) in 2006. This shows that both the
power law function and the scale-adjusted logarithm, Eq. (22), fit the data equally well
and that the scale $N_m \simeq 1$. Personal income, as we shall show below, is a problematic
quantity as it mixes truly urban dynamics with nation-wide economic transfers. As such,
its scaling exponent $\beta = 1.06$ is unusually low, which allows these two functions to
coincide over a larger population range. Fig. 1B shows patents filed in the same cities.
While it remains true that the scale-adjusted logarithmic function and the power law fit
the data equally well, the former now predicts a sharp nose dive in patent production
for small cities that would vanish at $N_m \simeq 200$ people. Similar effects occur for other
urban indicators in other urban systems: Fig. 2A shows the number of murders in Brazilian
Metropolitan areas and smaller municipalities. We fitted both the power law function and
the scale-adjusted logarithm to metropolitan areas and investigated how these predictions
generalize to smaller municipalities. While the power law performs reasonably well and
its exponent for metropolitan areas is consistent with that for the aggregated data set,
the scale-adjusted log function generalizes poorly and fails to predict the rate of violence in
small municipalities, which it would have vanish at the critical scale of $N_m = 14{,}454$
people. Figure 3B shows the scaling of income in Japanese Metropolitan areas: again,
while both types of functions work well among large cities, the scale-adjusted logarithm
would predict vanishing income for $N_m \simeq 257$ people.

Because we find these statements counter-factual, and because the scale-adjusted
logarithmic function does not derive from theory to the best of our knowledge or fit the data
manifestly better, we believe the hypothesis of urban scaling in terms of a scale invariant
function, predicted by theory [21], remains at this point the better explanation.
Nevertheless, the explicit observation of small settlements provides a testable prediction that can
distinguish these models. In this vein, some of us have recently applied the hypothesis of
urban scaling to archeological evidence from the Prehispanic Basin of Mexico [27]. In this
study of one of the most complete urban systems of an early major civilization, we find
human settlements as small as 10 people and about 1 hectare in land area. Urban systems
across four major cultural periods spanning about two millennia show power-law scaling
of settled area compatible with modern cities and general theoretical predictions [21].
Although more evidence for the properties of small settlements is desirable, we take this
as another indication that the hypothesis of urban scaling applies very generally.

Another general issue deals with the potential dynamical consequences of superlinear
power-law scaling as a driver of urban growth. This can lead to a finite-time singularity,
which is sometimes approximately observed [7,28]. A logarithmic function potentially
shows a qualitatively different behavior [26]. To test these ideas we present below the
analytical solution that follows for the analogue of the urban growth equation proposed
in [7]. Consider that some quantity $Y$, which scales faster than linearly, is seen as driving
the growth dynamics of a city, while we assume here for analytical tractability that costs
scale linearly (but see supplementary materials in [7] for a complete analysis). Then the
growth equation is

$$ \frac{dN}{dt} = Y(N) - BN. \tag{23} $$

The solution for a power-law function, $N_p(t)$, was given in [7]:

$$ N_p(t) = \left[ \frac{Y_0}{B} + \left( N_0^{1-\beta} - \frac{Y_0}{B} \right) e^{-B(1-\beta)t} \right]^{\frac{1}{1-\beta}}. \tag{24} $$

The solution for the logarithmic function, $N_{\ln}(t)$, is

$$ N_{\ln}(t) = N_m \exp\left[ \frac{B}{C} + \left( \ln\frac{N_0}{N_m} - \frac{B}{C} \right) e^{Ct} \right], \tag{25} $$

both obeying the initial condition $N(t = 0) = N_0$. Thus the logarithmic function is
qualitatively different from the power law in that it does not create a pure finite-time
singularity: instead this solution grows in time like an exponential of an exponential.
However, these two solutions are very hard to distinguish unless one is very close to the
(unphysical) singularity, which is not generally possible.
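
The two closed-form growth curves can be checked against the growth equation directly. The following is a minimal sketch under assumed, purely illustrative parameter values; the function names are hypothetical. It evaluates the power-law solution and the double-exponential solution for the scale-adjusted logarithm $Y = C N \ln(N/N_m)$, and compares a finite-difference time derivative with the right-hand side $Y(N) - BN$.

```python
import math

def n_power(t, n0, y0, beta, b):
    """Closed-form solution of dN/dt = y0 * N**beta - b * N with N(0) = n0."""
    bracket = y0 / b + (n0 ** (1.0 - beta) - y0 / b) * math.exp(-b * (1.0 - beta) * t)
    if bracket <= 0.0:
        raise ValueError("t is at or beyond the finite-time singularity")
    return bracket ** (1.0 / (1.0 - beta))

def n_log(t, n0, c, n_m, b):
    """Closed-form solution of dN/dt = c * N * ln(N / n_m) - b * N with N(0) = n0."""
    u0 = math.log(n0 / n_m)
    return n_m * math.exp(b / c + (u0 - b / c) * math.exp(c * t))

def central_difference(func, t, h=1e-4):
    """Numerical time derivative used to test the closed forms."""
    return (func(t + h) - func(t - h)) / (2.0 * h)

# Illustrative parameters (assumptions, not fitted values)
y0, beta, b = 0.01, 1.15, 0.02
c, n_m, n0, t = 0.01, 100.0, 1e4, 10.0

n_p = n_power(t, n0, y0, beta, b)
lhs_p = central_difference(lambda s: n_power(s, n0, y0, beta, b), t)
rhs_p = y0 * n_p ** beta - b * n_p

n_l = n_log(t, n0, c, n_m, b)
lhs_l = central_difference(lambda s: n_log(s, n0, c, n_m, b), t)
rhs_l = c * n_l * math.log(n_l / n_m) - b * n_l

print(lhs_p, rhs_p)  # both sides of the growth equation, power-law driving term
print(lhs_l, rhs_l)  # both sides of the growth equation, logarithmic driving term
```

Over moderate times the two trajectories are both smoothly accelerating, which illustrates why they are hard to tell apart away from the singularity.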

The true singularity, where $N \to \infty$, produced by the superlinear driving term, starting
at $N(t = 0) = N_0$, occurs at a finite time $t_c(N_0)$

$$ t_c(N_0) \simeq \frac{1}{(\beta - 1)\, Y_0 N_0^{\beta - 1}}. \tag{26} $$
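
The dependence of the singularity time on initial size can be sketched as follows. This block is an illustration under stated assumptions: the "exact" expression is obtained here by asking when the bracket in the power-law solution of Eq. (24) vanishes, the large-$N_0$ form neglects the linear cost term, and all parameter values are hypothetical.

```python
import math

def singularity_time_exact(n0, y0, beta, b):
    """Time at which the power-law solution diverges (requires beta > 1 and net growth)."""
    x = (b / y0) * n0 ** (1.0 - beta)
    if not 0.0 < x < 1.0:
        raise ValueError("no finite-time singularity for these parameters")
    return -math.log1p(-x) / ((beta - 1.0) * b)

def singularity_time_large_n(n0, y0, beta):
    """Large-N0 limit, where the linear cost term is negligible."""
    return 1.0 / ((beta - 1.0) * y0 * n0 ** (beta - 1.0))

y0, beta, b = 0.01, 1.15, 0.001  # illustrative values
sizes = [1e5, 1e6, 1e7]
exact_times = [singularity_time_exact(n0, y0, beta, b) for n0 in sizes]
ratio = exact_times[-1] / singularity_time_large_n(sizes[-1], y0, beta)

print(exact_times)  # shorter times for larger initial populations
print(ratio)        # close to 1 when N0 is large
```

The time to the singularity shrinks as $N_0^{-(\beta - 1)}$, so larger cities approach it faster.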



Short of this singularity, both driving terms predict an acceleration of the population
growth rate $r = \frac{1}{N}\frac{dN}{dt}$ in time and as a function of the initial population size, $N_0$. If we
compute the characteristic time that it takes to reach a certain maximum growth rate $r^*$,
from Eq. (24) and (25), we obtain in each case

$$ t^*_p = \frac{1}{(\beta - 1)B} \ln\left[ \frac{r^*}{r^* + B}\, \frac{1}{1 - \frac{B}{Y_0} N_0^{1-\beta}} \right], \qquad t^*_{\ln} = \frac{1}{C} \ln\left[ \frac{r^*}{C \ln\frac{N_0}{N_m} - B} \right]. \tag{27} $$

The first expression reduces to Eq. (26) in the limit of large rate $r^*$ and large $N_0$, which
shows how $t^*$ becomes shorter with larger $N_0$ [7]. This property remains true even when
these limits are not taken, though its expression is more complicated. Likewise, the expression
for $t^*_{\ln}$ under log-driven growth, even in the absence of a true finite time singularity, also
shows the same properties. We write it as

$$ t^*_{\ln} = \frac{1}{C} \ln\left[ \frac{r^*}{y(N_0) - B} \right], \tag{28} $$

where $y(N_0) = Y(N_0)/N_0$, which shows that as $N_0$ becomes larger the initial condition
in the growth equation approaches a fixed $r^*$ in a finite and decreasing time (note that
the initial condition is such that $y(N_0) > B$, for all $N_0$). Thus, the general observable
properties of the growth rate remain essentially the same, regardless of the use of a scale
invariant driving function, or of a scale-adjusted logarithm that mimics it. It would be
interesting to investigate in the future if there are growth curves for cities from which
the expression for the driving term can be estimated in a sufficiently robust way that
hypotheses about different driving terms can be tested. In reality, however, the growth
of cities in time is more complex than that given by Eq. (23), because of a more complex
cost structure [21] and because the time variation of the pre-factors, such as $Y_0$, cannot
be ignored.



Mixed quantities: urban vs. national effects 

A practical issue with estimating scale invariant relations is that available data does not 
always refer to quantities that are truly local to each city. One example is personal income. 



Personal income is a combination of several components 29 , such as wages, profits from 
investments and transfers. Transfers refer to government payments that effectively re- 
distribute wealth from richer (larger) cities to poorer (smaller) ones. These nation-wide 
transfers have several effects on the statistics of this urban indicator. First, the fit is 
unusually good, with very small outliers. Second, its scaling exponent is only weakly 
superlinear, /3 = 1.06 — 1.07. These effects are partly due to the fact that personal income 
as given is not a pure measure of the cities economic performance but partially the result 
of urban system wide redistributions. Figure 3 shows the scaling of net earning^] and of 

1 As defined by the US Bureau of Economic Analysis, "Net earnings is earnings by place of work (the 
sum of wage and salary disbursements, supplements to wages and salaries, and proprietors income) less 
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transfers. We see that net earnings are more superlinear j3 = 1.12 but transfers (received) 
are slightly sub-linear with an exponent /3 = 0.96. 

What scaling properties, if any, can we expect from the mixture of these components? 
In general the answer is that the sum of two power-law functions is not a power law. 
However, in practice, we may expect that the sum may behave approximately as a power 
law, provided that the two exponents are sufficiently similar. To see how this can be, 
consider the function 

$$Y(N) = A N^{a} + B N^{b}. \tag{29}$$

As we have already noted above, the exponent of any function Y(N) can be estimated
via the quantity

$$\beta = \frac{d \ln Y}{d \ln N} = \frac{1}{Y} \frac{dY}{d \ln N} = \frac{A a\, e^{a \ln N} + B b\, e^{b \ln N}}{A e^{a \ln N} + B e^{b \ln N}}. \tag{30}$$

Then, we obtain, for small $\epsilon \ln N$, with $\epsilon = a - b$,

$$\beta \simeq \frac{A a + B b}{A + B} + \frac{A B}{(A + B)^{2}}\, \epsilon^{2} \ln N. \tag{31}$$

This formula generalizes trivially to more components in the sum. Thus, provided the
exponents are similar, the (approximate) exponent for the sum is well predicted by the
weighted average of the exponents of each additive scaling function (first term in Eq. (31)),
with the first correction being controlled by the small quantity $\epsilon^{2} \ln N$. Consequently,
to a very good approximation the resulting exponent is the weighted average of the two
underlying exponents. For example, if A = B then the resulting exponent will be the
average $\beta = (a + b)/2$, to leading order.
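
The weighted-average approximation can be checked numerically. The sketch below compares the exact local exponent of a sum of two power laws with the weighted average plus its leading correction in $\epsilon^{2} \ln N$. It assumes that N is measured relative to a reference city size, so that $\epsilon \ln N$ stays small, and that A = B = 1 at that reference size; both choices are illustrative and are not taken from the data. The exponents 1.12 and 0.96 are the values quoted above for net earnings and transfers.

```python
import math

def exact_exponent(N, A, a, B, b):
    """Local exponent d ln Y / d ln N of Y(N) = A*N**a + B*N**b."""
    ya, yb = A * N**a, B * N**b
    return (a * ya + b * yb) / (ya + yb)

def approx_exponent(N, A, a, B, b):
    """Weighted average of a and b plus the leading correction in (a - b)**2 * ln N."""
    eps = a - b
    return (A * a + B * b) / (A + B) + A * B / (A + B) ** 2 * eps**2 * math.log(N)

# Exponents quoted in the text; A = B = 1 and the relative sizes are assumptions.
A, a, B, b = 1.0, 1.12, 1.0, 0.96
for N in (0.1, 0.5, 1.0, 2.0, 10.0):  # N in units of a reference city size
    print(f"N={N:5.1f}  exact={exact_exponent(N, A, a, B, b):.5f}  "
          f"approx={approx_exponent(N, A, a, B, b):.5f}")
```

Over two orders of magnitude in relative size the two expressions stay close; the agreement degrades once $\epsilon \ln N$ is no longer small, which is the regime where the sum stops looking like a single power law.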

Another related issue is the shape of the distribution of deviations away from scaling.
These are the residuals of a log-log fit, which in Ref. [8] we defined as Scale Adjusted
Metropolitan Indicators (SAMIs),

$$\xi_i = \ln \frac{Y_i}{Y_0 N_i^{\beta}}. \tag{32}$$

In Refs. [8, 22, 30] we briefly addressed the shape of the statistical distribution of $\xi$ and
showed that it was reasonably well described in terms of a Gaussian (that is, a log-normal
in the original variables), at least at the tails. Here, we simply want to emphasize that a
mixture of variables so distributed will not have a similar distribution, and may show
skewness.

contributions for government social insurance, plus an adjustment to convert earnings by place of work
to a place-of-residence basis"

Consider then a mixture of two quantities, $Y_1$ and $Y_2$, each of them obeying on average
a scaling law, with exponents $\beta_1$ and $\beta_2$. We will assume that these two variables are
log-normally distributed around the scaling relation, for simplicity. While the sum of the
log variables, which have Gaussian distributions, is also Gaussian,

$$\log(Y_1) + \log(Y_2) \sim \mathcal{N}(\mu, \sigma), \tag{33}$$

it is not true that the log of the sum of log-normals is Gaussian. Several approximations
have been derived in the literature to extract the properties of the distribution of sums
of correlated or independent lognormal variables [31, 33]. Most of these match other
distributions to the measured parameters of the sum distribution [32, 33]. It has been
shown that the tails of the sum distribution are often approximately log-normal [32, 33],
but the body of the distribution deviates from a simple log-normal shape. In particular it
often shows skewness, so that a skewed log-normal distribution provides a good general fit
to the sum [33]. These properties of residuals' statistics, which may occasionally appear,
may be taken as signs that the underlying quantity is an additive mixture of more
fundamental urban indicators and that it may combine urban effects with nation-wide
dynamics, such as transfers in the case of personal income.
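
A small simulation illustrates the point about skewness. The sketch below draws two independent log-normal variables and compares the sample skewness of the log of one component with that of the log of their sum. Independence, equal medians, the width sigma = 2 (deliberately large so that the effect is visible) and the sample size are all assumptions made for illustration; this is not the procedure of Refs. [31-33].

```python
import math
import random

def skewness(xs):
    """Sample skewness: third central moment divided by variance**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def simulate(n=200_000, sigma=2.0, seed=0):
    """Return the skewness of ln(Y1) and of ln(Y1 + Y2) for independent log-normal Y1, Y2."""
    rng = random.Random(seed)
    log_single, log_sum = [], []
    for _ in range(n):
        g1, g2 = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
        log_single.append(g1)  # ln(Y1) is Gaussian by construction
        log_sum.append(math.log(math.exp(g1) + math.exp(g2)))
    return skewness(log_single), skewness(log_sum)

skew_single, skew_sum = simulate()
print(f"skewness of ln(Y1):      {skew_single:+.3f}")
print(f"skewness of ln(Y1 + Y2): {skew_sum:+.3f}")
```

The log of a single component shows no appreciable skewness, while the log of the sum is skewed to the right, which is the kind of signature in the residuals that may point to an additive mixture.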



Cities are non-extensive complex systems 

Here, we discuss a conceptual and methodological issue underlying the study of cities 
across different disciplines. Specifically, should we think of cities as extensive systems, 
in the sense used in statistical physics, with constant city size-independent densities (per 
capita quantities) including population density, wages, crime? Or should we, instead, 
approach the study of cities from their global properties across the city (total wages, 
total GDP, total crime) and the implicit expectation that densities within the city are 
non-intensive, highly variable and must therefore be approached statistically? 

To be fair the general answer to this question is well known: all complex systems are 
non-extensive and their (obvious) densities are non-intensive; that is, they depend
nonlinearly on measures of total system size. This means, for example, that when we divide
a complex system into parts we cannot generally expect the average properties of each part
to be similar to each other. We also find that the properties of the parts, including their
basic elements (such as people and locations in a city), depend on the size of the entire 
system. Think of an organism or a brain or an ecosystem, where different parts perform 
different functions. Cities too, show extreme spatial and individual heterogeneity: there 
are rich and poor neighborhoods, there are business districts, which are almost exclusively 
dedicated to jobs and trade and there are suburbs, which are mostly residential. There 
are parts of the city dedicated to certain types of business, manufacturing, etc. Likewise, 
there are poor and rich individuals, people with radically different economic and social 
expertise, motivations, ethnicity, etc. The city only makes sense when all these parts 
are put together as an interacting social system [6, 21]: there is no such thing as a
representative average place or average person inside the city. There is, however, a general 
sense of the global socioeconomic and spatiotemporal fabric of cities. 

The question of unit of analysis is fundamental because it deals with targets of expla- 
nation and theory development. For example, in urban economics one attempts to explain 
(average) per capita quantities, such as wages per worker or crime rates. However, it is 
uncommon in economics to develop statistical microscopic theories. As a counterpoint to 
this approach, when studying complex systems we often start with the bulk properties of a 
system because of the expectation that the whole is 'more than the sum of its parts', that 
is that the properties of individuals are a reflection of system-wide dynamics [34]. These 
global properties are what is directly measurable and are usually simpler than individual 
attributes from which the statistics of densities - not just their means - are derived. In
our previous and current work we have advocated the latter approach [7, 8, 22]. Here we
provide some of our rationale in greater detail.

The difference between extensive and non-extensive systems and of the type of theory 
that is appropriate to describe them is best illustrated via a few examples. In simple 
physical systems intensive quantities are well defined - meaning that they are truly system 
size independent - when a system's total properties are proportional to the system's 
size. In this case densities are ratios of extensive quantities, which are linear functions of
system size, so that their trivial size dependence cancels out, leaving us with constants that
characterize the system. The simplest familiar example is a(n ideal) gas in a container
whose bulk properties are proportional to the volume and to the number of particles
therein. The average energy per particle (i.e. the temperature) and the force per unit
area (pressure) are then simple functions of each other and of the volume and number of
particles. Extensive systems have no shape: they are well characterized by homogeneous
densities and stretch to fill the available volume, provided exogenously. Thus, for these
simpler systems, extensive or intensive variables are trivially interchangeable (because
$\beta = 1$) and the latter provide well defined average properties of the system's microscopic
elements. The statistics of individual elements is provided in turn from distributions that
maximize entropy, subject to constraints arising from the average value of intensive
quantities (like particle density and kinetic energy), e.g. the Maxwell-Boltzmann distribution
of particle velocities in a gas at temperature T.

Now, contrast these properties with those of cities: Cities are localized in space and 
have definite shapes. Though we measure directly total quantities, such as total GDP, 
total crime or population, we know that their properties - imagined as continuum den- 
sities over space - vary dramatically from one location within the city to another and 
among different people, as is emphasized e.g. by crime hotspots [35] or by population
density maps at night or during the day. From this perspective recent work in urban
economics and urban scaling has established the non-extensivity of cities, as infrastructure
and socioeconomic properties do not vary proportionally (linearly, $\beta = 1$) with surface
area or population. Furthermore, unlike gas molecules, individual members of an urban
community display tremendous variability with respect to creativity, entrepreneurship,
productivity, propensity for violent behavior, etc. [30].
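
The distinction can be made concrete with a few lines of arithmetic. The sketch below computes the per capita density Y/N implied by a total quantity Y = Y0 N^beta, once for the extensive case beta = 1 and once for beta = 1.12, the exponent quoted earlier for net earnings. The prefactor Y0 = 1 and the city sizes are placeholders chosen only for illustration.

```python
def per_capita(N, Y0, beta):
    """Per capita density y = Y / N for a total quantity Y = Y0 * N**beta."""
    return Y0 * N**beta / N

sizes = [10_000, 100_000, 1_000_000]
for beta in (1.0, 1.12):
    densities = [per_capita(N, Y0=1.0, beta=beta) for N in sizes]
    print(f"beta={beta}: " + ", ".join(f"{d:.3f}" for d in densities))
```

For beta = 1 the density is the same constant at every size, so it characterizes the system; for beta different from 1 it changes by a fixed factor with every tenfold increase in population, so no single per capita value is representative.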



How then should we start to characterize a city? Non-extensive systems in physics are
the result of long-range forces, such as electromagnetism or gravity: canonical (equilibrium)
thermodynamics generally breaks down as a global theory in these cases, or must
at least be handled under specific restrictions [36]. Not incidentally, attractive forces are
often invoked to explain several phenomena relating to cities. For example, gravity models
have been quite successful at accounting for population movements, such as migration
flows [37]. Moreover, agglomeration effects invoked in urban economics and economic
geography [14] are a form of attractive interaction between people or firms, even if the
quantitative form of this force is not explicitly known. Likewise, the movements of people
inside cities are dictated by social interactions that afford them an income and other
opportunities. Thus, we should think of cities as social systems that result from attractive
forces.

What can be said about cities from the perspective of the statistical properties of 
systems assembled by attractive interactions? One familiar result of gravitational forces 
is the creation of stars, which are particle reactors held together by the balance of ra- 
diation pressure leaving the star and gravity compressing it. Cities in urban economics 
and regional science are also thought to be the result of the balance between centrifugal 
(congestion, high land rents, crime) and centripetal (beneficial social interactions,
agglomeration effects) forces [14]. The resulting structures, in stars and in cities, are not globally
rigorously stable but can persist for long times in a state of dynamical local equilibrium. 
This means, in general, that each location or person within the system is the result of the 
dynamical balance of attractive and repulsive forces, whose magnitude is a function of 
their position relative to all other elements in the system. This often creates mono-centric 
geometries, but not necessarily. In this state, locally attractive and repulsive forces equili- 
brate each other leading to a spatial profile of varying density, pressure, and temperature. 
There is no sense of the typical density or temperature in a star as it varies dramatically 
from the center (at nuclear densities) to the periphery (vacuum of outer space). Stars, like 
cities, must then be understood in terms of global quantities and the conditions and con- 
straints that they impose within their structure. By promoting faster interaction rates, 
however, both larger stars and cities create total outputs that scale superlinearly with 
their size: for example, the total power emitted by stars (luminosity) is also a superlinear
function of their total mass, albeit with different exponents from cities due to the nature
of radiation and its transport [21].



Statistical theories of even simple systems governed by attractive forces remain, to
some extent, a general open problem [38]. However, statistical mechanics of these systems
can be developed based on intensive quantities that are either place and time
dependent [39] and/or judiciously constructed by removing average size effects. When this
is done, we may expect that the resulting statistics become simpler and may belong to
the exponential family of distributions.



Statistical theories of cities 



A statistical theory of cities is necessary to eventually account for individual and collective 
variability in and across urban areas. Statistical theories, in the sense of theoretical 
physics, are uncommon in geography, and especially among the social sciences, including
economics (with the notable exception of finance). Here, we discuss some initial attempts 
at characterizing the statistical properties of urban quantities and their connection to 
scaling relations. 

As we pointed out in the preceding section, the main challenge to formulating a sta- 
tistical theory of urban quantities is dealing with the size dependence of most densities, 
expressed usually as per capita quantities. In Ref. [8], we showed that certain simple 
scale invariant quantities can be constructed using scaling. This procedure was further 
explored in Ref. [22], where we gave a more precise statistical meaning to urban scaling
relations and established their connection to Zipf's law for the size distribution of cities.
Here we place these findings in a wider context and discuss some of what remains to be 
done on the path towards a statistical theory of cities. 

First, the statistics of urban indicators can be approached directly, in terms of the
statistics of Y, or indirectly via associated quantities that are scale independent. It is
observed empirically that the distribution of Y, for example for murders [22], is very
broad, and is generally well described by a power law (Pareto) distribution. On the other
hand, the conditional distribution P(Y|N) is more localized. Formally, this means that
population size carries information on the values that Y is likely to take. The estimation of
these distributions is often made difficult by the type of proxy data available. For example,
small enough cities may show, in a given year, zero, one or two murders, patents, etc. Does
this mean that scaling breaks down in these cases? Not necessarily: the expected value of
Y must be treated statistically. That is, there may be only a small probability that a small
city will have one event per year, but this probability, unlike the detailed outcome, remains
non-zero and is a scaling function of N.
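
One way to picture this statement is to treat the scaling law as the mean of a counting process. The sketch below assumes, purely for illustration, that yearly event counts in a city are Poisson distributed with mean Y0 N^beta; the Poisson form and the values of Y0 and beta are hypothetical and are not taken from the text or from Ref. [22].

```python
import math

def expected_count(N, Y0, beta):
    """Expected number of events in a city of size N under the scaling law Y0 * N**beta."""
    return Y0 * N**beta

def prob_at_least_one(N, Y0, beta):
    """P(Y >= 1) if yearly counts are Poisson with the scaling law as their mean (an assumption)."""
    return -math.expm1(-expected_count(N, Y0, beta))

Y0, beta = 1e-6, 1.15  # hypothetical values, chosen only for illustration
for N in (1_000, 10_000, 100_000, 1_000_000):
    print(f"N={N:>9,}  expected={expected_count(N, Y0, beta):.4f}  "
          f"P(at least one)={prob_at_least_one(N, Y0, beta):.4f}")
```

Under this assumption a small city records zero events in most years, yet the probability of an event is non-zero and, while the expected count is small, nearly equal to the expected count itself, so it inherits the same power-law dependence on N.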

We showed recently that this implies that the correct statistical interpretation of
scaling laws is as expected values of Y, given N. Moreover, these conditional statistics are
naturally in the lognormal family [22]. To see this, consider that

$$P(Y) = \sum_{N} P(Y|N)\, P(N), \tag{34}$$

where P(N) is the probability of a city of population size N, related to Zipf's rank-size
distribution. Indeed, with P(N) given approximately by a power law, P(Y) will also be
Pareto distributed if P(Y|N) is lognormal [22].
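
The mixture of a Pareto size distribution with lognormal conditional statistics can be explored by simulation. The sketch below draws city sizes from a Pareto distribution, draws Y around the scaling law with a Gaussian residual in log space, and estimates the tail exponent of the resulting totals with the Hill estimator. The size exponent alpha = 1 (Zipf), beta = 1.15, the residual width sigma = 0.3, Y0 = 1 and the use of the Hill estimator are all illustrative assumptions rather than the procedure of Ref. [22].

```python
import math
import random

def hill_tail_exponent(values, k):
    """Hill estimate of alpha in P(X > x) ~ x**(-alpha), using the k largest values."""
    top = sorted(values, reverse=True)[: k + 1]
    threshold = top[k]  # the (k+1)-th largest value
    return k / sum(math.log(v / threshold) for v in top[:k])

def simulate_totals(n_cities=100_000, alpha=1.0, beta=1.15, sigma=0.3, seed=1):
    """Draw Pareto city sizes N, then lognormal Y around the scaling law N**beta."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_cities):
        N = rng.paretovariate(alpha)            # P(N > n) = n**(-alpha) for n >= 1
        xi = rng.gauss(0.0, sigma)              # Gaussian residual in log space
        totals.append(N**beta * math.exp(xi))   # Y0 set to 1 for simplicity
    return totals

alpha, beta = 1.0, 1.15
estimate = hill_tail_exponent(simulate_totals(alpha=alpha, beta=beta), k=1000)
print(f"Hill estimate of the tail exponent of Y: {estimate:.2f}")
print(f"value expected from the size distribution, alpha / beta: {alpha / beta:.2f}")
```

In this setting the totals inherit a power-law tail from the city size distribution, with exponent close to alpha / beta, while the lognormal scatter only changes the prefactor of the tail.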
In practice, for quantities that are granular and can be zero at each realization, we
estimate instead P(N|Y), and use Bayes' relation to compute P(Y|N),

$$P(Y|N) = \frac{P(N|Y)\, P(Y)}{P(N)}, \tag{35}$$

which is also distributed as a lognormal if P(N) and P(Y) are Pareto and P(N|Y) is
lognormal, as it is for murders in several Latin American urban systems [22]. The type
of distribution observed for other variables remains an open problem, but we should note
already that the two conditional distributions may take more complicated forms if the
city size distribution deviates from Pareto, or vice-versa.

An alternative approach, inspired by the general theoretical considerations discussed
in the previous section, is to create ab initio quantities that are invariants of city size and,
as such, provide suitable densities for which we may expect well behaved statistics to
emerge. Some of these invariants are suggested by theory [21]: for example, the product of
any social output per capita times the volume of infrastructure per capita is, on average,
city size independent. But more generally we can build intensive quantities by explicitly
removing their size dependence [8], such as for the SAMIs $\xi$, Eq. (32).

The correspondence between the statistics of Y and those of $\xi$ is just what one may
expect in terms of transforming the original lognormal statistics into more familiar
statistical mechanics. Besides subtracting the conditional mean of Y given N, which is the
scaling law, the logarithmic transformation results in Gaussian statistics for $\xi$. Thus,
this approach maps the somewhat exotic statistics of non-extensive urban indicators, Y,
to canonical statistics of well defined densities, $\xi$. The $\xi$ appear to always have bounded
variance, so that Gaussian statistics may be expected, at least in the absence of additional 
constraints. Exploring these statistics more extensively remains an important problem 
for future research. 
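
As a minimal sketch of how such size-independent residuals can be computed, the code below fits ln Y = ln Y0 + beta ln N by ordinary least squares and returns the residuals ln Y - ln Y0 - beta ln N for each city. The five data points are hypothetical (an exact power law with one city placed 20% above it), and ordinary least squares is an assumption made here for simplicity.

```python
import math

def fit_scaling(N, Y):
    """Ordinary least squares fit of ln Y = ln Y0 + beta * ln N; returns (Y0, beta)."""
    x = [math.log(n) for n in N]
    y = [math.log(v) for v in Y]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xv - mx) * (yv - my) for xv, yv in zip(x, y))
    sxx = sum((xv - mx) ** 2 for xv in x)
    beta = sxy / sxx
    return math.exp(my - beta * mx), beta

def samis(N, Y):
    """Residuals ln(Y_i / (Y0 * N_i**beta)) of the log-log fit, one per city."""
    Y0, beta = fit_scaling(N, Y)
    return [math.log(v / (Y0 * n**beta)) for n, v in zip(N, Y)]

# Hypothetical data: five cities on an exact power law, with the middle one 20% above it.
N = [1e4, 1e5, 1e6, 1e7, 1e8]
Y = [2.0 * n**1.1 for n in N]
Y[2] *= 1.2
for n, s in zip(N, samis(N, Y)):
    print(f"N={n:.0e}  residual={s:+.3f}")
```

The residuals sum to zero by construction of the fit and carry no remaining trend with N, so cities of very different sizes can be compared on the same footing.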

From the perspective of the approach sketched here one can obtain a general statistical
characterization of many urban indicators. For example, we have recently shown that a
Cobb-Douglas production function is the expected value for the economic output of a
city, conditional on its population size, and determined its statistical behavior [81]. The
eventual goal of a theory of cities should be to determine the form and value of urban
indicator statistics, and specifically of the variances of size-independent densities, which
are quantity specific, as can be seen in Figs. 1 and 2. Such a theory should also explain
the spatial and temporal correlations between densities and why these statistics, when
expressed in their original variables, appear to be the result of multiplicative random
processes [40].



Population size, agglomeration and urban hierarchies 

Although methodologically novel, the hypothesis of urban scaling has many points of con- 
tact with existing literature on cities in the fields of archaeology, anthropology, sociology 
and economics, as well as with the extensive historiographical treatments of cities and 
urbanization. This is not the place to attempt a comprehensive literature survey, but a 
few remarks on the empirical and conceptual overlaps between the urban scaling hypothesis
and other research traditions serve to strengthen the empirical underpinnings of the
hypothesis, locate it within the myriad of efforts to understand cities and point the way 
to research extensions. 

The role of population size in spurring organizational and technological change has
been an important formative idea in economics since at least Alfred Marshall's account of
industrial districts [28, 41-48]. Population size (in cities) has been proposed many times as
the main general transformative phenomenon in ancient and modern human societies, see,
for example, [13, 49, 51, 60]. In all these cases it is not just population size that matters,
but the resulting social dynamics of interaction in a larger pool of people that requires
and promotes new organizational and technological solutions. Memorably, Jane Jacobs in
The Economy of Cities [6] defined a city as a population agglomeration that through its
organization and innovation is able to generate (endogenously) its own economic growth.

This foundational work led to many efforts to quantify the economic properties of
cities as a function of their population size. The most important conceptual framework
of modern urban economics is based on the idea of agglomeration economies, which are a
form of externality that results in a set of benefits that firms and individuals obtain when
locating near each other^1 [20, 62, 64-66]. A dense interacting co-location of people and firms
is, of course, exactly what a city is [21].

1 As Lucas ([61], p. 39) asks: "What can people be paying Manhattan or downtown Chicago rents for,
if not for being near other people?"

A large empirical literature in urban economics
has attempted to measure the economic effects of urban agglomeration, when controlling
for other factors, such as the level of education in the population, age, sector composition,
etc. Thus, this empirical approach measures net agglomeration effects, which are almost
always characterized in terms of power-law relations vs. population. Over the last forty
years, exponents characterizing these net scaling relations have been obtained for several
measures of economic output, in different nations and controlling for different factors.
These net elasticities have been estimated in the range 3-8% [63]. These results imply
that measurements of the dependence of gross socioeconomic output on population
size, that is of the dependence of Y on N in the absence of control variables, must
yield larger exponents, as observed by scaling analysis. One issue with measuring net
agglomeration elasticities (exponents), using many of the common control variables, is that
these controls are naturally themselves population size dependent, following similar
power-law scaling relations [7]. It is also difficult to consistently include the same controls
in different nations and over time, and factors like human capital remain difficult to measure
objectively, which has been cause for some controversy as to the magnitude of its importance
as an input factor.
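
The relation between net and gross exponents can be made explicit with a short worked example. Assume, purely for illustration, a single control variable X (for instance a measure of human capital) that enters output with elasticity $\gamma$ and that itself scales with population with exponent $\delta$. The symbols $\beta_{\mathrm{net}}$, $\gamma$, $\delta$, X and $X_0$ are introduced here for the sketch and do not appear in the original analysis.

```latex
\begin{align}
\ln Y &= \ln Y_0 + \beta_{\mathrm{net}} \ln N + \gamma \ln X,
\qquad X = X_0 N^{\delta} \\
\Rightarrow \quad
\ln Y &= \ln\!\left( Y_0 X_0^{\gamma} \right)
       + \left( \beta_{\mathrm{net}} + \gamma\,\delta \right) \ln N .
\end{align}
```

Under these assumptions the gross exponent is $\beta = \beta_{\mathrm{net}} + \gamma \delta$, which exceeds the net value whenever the control both raises output and grows with city size ($\gamma \delta > 0$).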

Besides the measurement of net or gross agglomeration effects, urbanists, regional
scientists and economic geographers have attempted to characterize the general mechanisms
that lead to productivity increases through social interactions. In somewhat different
ways, authors as diverse as Marshall [41], Mumford [1], Jacobs [6, 67], Pred [68],
Braudel [69], Bairoch [3], Hall [2], Florida [70], Glaeser [71] and Kennedy [72] all emphasize
in their writings the advantages conferred by larger urban population size. Specifically, a
larger population allows for a more extensive division of labor and specialization, engenders
greater diversity, fosters the generation, exchange and recombination of ideas, makes
it possible for scale economies to manifest themselves and facilitates the building of
physical and social infrastructure. Duranton and Puga [64, 73] have attempted to systematize
these effects in terms of three categories of mechanisms, namely sharing, matching and
learning. Sharing refers primarily to the joint use of indivisible assets, such as a hospital
or a stadium. Matching refers to a more productive allocation of workers to jobs, for
example, while learning deals with processes of information transfer between individuals
and firms. These categories of effects provide a useful organizational framework bridging
several economic processes, but they may not capture all socioeconomic phenomena
leading to increases in productivity with size. Another example, from sociology, is
Claude Fischer's subculture theory [74] (see also Wirth's influential early essay [10]), which
argues that larger populations promote new (unconventional) behaviors, from deviance
to innovation, reflected in a greater number of new subcultures, created by small groups
of people with particular interests in common. Note that this effect is not necessarily
economic, and deals in general with the creation of new information and culture through
recognizable new social collectives. This rich literature provides many examples of the
socioeconomic effects that can be stimulated by city size but, in contrast to scaling
theory [7, 21], it does not formalize these detailed phenomena quantitatively, in terms of
the structure and dynamics of social networks that are predictive of observed agglomeration
effect exponents.

Population size figures prominently in another long-standing research tradition on
cities, namely urban hierarchies. Cities do not exist in isolation, as they are linked to each
other through the exchange of goods, people, capital and information. The starting point
for urban hierarchy models - and central postulate of central place theory [15] - is the
observation that larger cities have a more diverse set of industries and occupations than
smaller places. Hierarchy means specifically that the set of activities and occupations
in larger places includes those found at smaller settlements, but not the reverse. Many
models of urban hierarchies are based on 'central place theory', under which cities function
as 'central places' providing greater value-added goods to the surrounding areas [15].
Variations include Lösch's analysis of hierarchical centers based upon market areas for
industrial firms [16] and the Tinbergen-Bos model, which distributes industries among
cities of different sizes according to their relative economies of scale [75, 76]. Evidence
for the existence of functional hierarchical patterns in urban systems has been well
documented [77, 78] and they have been shown to hold not only for urban systems within
nations but also across national boundaries [79]. The link between population size, urban
industry and occupational structure in terms of a hierarchy of greater value added
services implies that wage or productivity decompositions, in which the average wage or
GDP per worker of an urban area is 'explained' as a result of its industry composition
(see, for example, [26, 80]), are a redundant exercise. Only in the absence of a size hierarchy
of functions and productivity, that is, in a putative but non-existent urban system where
each city would fully specialize in different industries, would such a decomposition be
potentially explanatory. The ideas of central place theory have continued to be elaborated
and, in conjunction with concepts of urban agglomeration, introduced through assumed
forms for urban production functions with increasing returns [14], constitute the core of
modern economic geography. From this perspective, urban hierarchies can indeed be
derived from a more microscopic theory of social interactions in cities that is the goal of
urban scaling [21, 78, 81].



The main mystery that remains regarding the structure of urban systems deals, in our
opinion, with their dynamics over time. Present theories of regional economics (in analogy
with central place theory) provide a means of deriving spatial equilibria (distributions of
people and economic activity) across cities in an urban system, but they do not tell us how
they evolve in time. Likewise, urban scaling research has been primarily concerned with
the value of urban exponents, which are approximately constant over time, but not with
the time dependence of the pre-factors, $Y_0$, that describe how the urban system evolves
as a whole. The constancy of exponents implies that larger cities do not, on average,
grow faster than cities in smaller size groups and the average growth rates of cities of all
population sizes tend to be the same [79, 82, 83]. This independence of growth rates, not
only of population but also for other socioeconomic properties, is the proximate cause
of the Zipfian distribution of city sizes across the urban system [82]. However, how
exactly this equilibration happens, such that cities large and small hold the same appeal
and grow economically at the same pace despite their productivity differentials, remains
an important test for a science of cities as well as for a general theory of socioeconomic
development [84].



Discussion 

In this manuscript we have sought to clarify the hypothesis of urban scaling, discuss some of
its potential limitations and methodological issues and highlight how scaling phenomena 
reflect the properties of cities as general complex systems. We have also highlighted what 
are, in our opinion, the main outstanding problems on the way to a science of cities as 
a unified field, recognizable across disciplines, from physics to the full spectrum of the 
social sciences. 

First, in its strongest form, the hypothesis of urban scaling states that functional
properties of cities, such as their level of conflict, economic productivity and material
infrastructure, should all vary in a scale invariant way from the largest cities to the smallest
towns within an urban system. The expectation that this picture emerges as empirically
true is grounded in recent theory [21], but should be further tested by measurements on
all scales, as small settlements remain typically unaccounted for in most datasets.
Nevertheless, we take it as factual - though yet to be quantified - that even the smallest
settlements, including those studied by anthropologists [59], archeologists [27] and
sociologists, have elements that functionally find correspondences in larger modern cities.
These generic arguments then establish the only possible scale invariant function - a power
law - as the preferred description of cities across scales. Any other function, even if it fits
available data equally well, introduces a population scale threshold below which this property
vanishes. Such scales, when introduced in fitting functions, are observed to vary empirically
from quantity to quantity, across urban systems and over time, leading to what, in
our opinion, are potentially counterfactual results. This case, however, must ultimately
be investigated and settled empirically and is in our opinion an interesting direction for
future work.
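
The contrast between a scale invariant function and one that introduces a population threshold can be seen in a few lines. The sketch below compares a power law with the scale-adjusted logarithm C N log(N/N_m) used as an alternative fit in Figure 1. The exponent 1.13 and the scale N_m = 200 are the values quoted in that caption for patents; the prefactors Y0 = 1 and C = 1 are placeholders chosen only for illustration.

```python
import math

def power_law(N, Y0, beta):
    """Scale invariant form Y = Y0 * N**beta."""
    return Y0 * N**beta

def scale_adjusted_log(N, C, N_m):
    """Alternative form Y = C * N * ln(N / N_m), which introduces the scale N_m."""
    return C * N * math.log(N / N_m)

# Rescaling N by a factor of 10 multiplies the power law by the same factor at every size.
beta = 1.13
ratios = [power_law(10 * N, 1.0, beta) / power_law(N, 1.0, beta) for N in (1e2, 1e4, 1e6)]
print("power-law ratios:", ", ".join(f"{r:.3f}" for r in ratios))

# The logarithmic form vanishes at N = N_m and is negative below it.
for N in (100, 200, 1_000):
    print(f"N={N:>5}  scale-adjusted log = {scale_adjusted_log(N, C=1.0, N_m=200):+.1f}")
```

The power law has no preferred size, whereas the logarithmic form changes sign at N_m, which is the population threshold referred to above.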

Second, several methodological issues remain unsettled in the empirical study of cities
related to spatial aggregation, statistical interpretation of the data and limitations of
proxy measurements. At present, there are no foolproof algorithms for defining the
socio-spatial limits of a city or even a fully generalizable characterization of urban life, though
the concept of functional cities as interacting populations has firmly emerged as the correct
idea. Nevertheless, this remains hard to delimit unambiguously in practice, as efforts at
the pan-European level have demonstrated over the last few years, for example. The
choice of an arbitrary administrative unit (counties, municipalities, etc.) often distorts
the estimation of agglomeration effects and modifies the conclusions of scaling analysis.
We have also shown how certain urban metrics (e.g. personal income) may manifest
deviations from typical expected scaling behavior as the result of mixing truly urban
(local) dynamics with counteracting national effects. These issues can in principle be
addressed and corrected for but they require understanding of the nature of proxy variables
and their limitations.

Finally, we have shown that cities are not extensive systems and, as a consequence, that 
densities (per capita or per area urban indicators) are necessarily system size dependent 
and heterogeneous, typically varying widely within a city. Additionally, there is signifi- 
cant heterogeneity among the constituent units of cities, namely individuals, households 
and businesses. These properties are the necessary consequence of the generic dynamics 
of agglomeration that characterizes the formation of cities. The fact that densities are 
not direct observables but constructs of extensive quantities should make us prefer
city-wide total measurements as the primary characterization of cities. To us, and in analogy
to successful quantitative approaches in other disciplines, this situation suggests that
intensive quantities should be approached statistically, as the result of city-wide constraints
(total population, area, transportation technology, etc.) and not the reverse. Of course, in
many statistical analyses of urban metrics the average information contained in per capita 
measurements constructed over the entire city is equivalent to that in total quantities. 
We expect that these issues will be settled as statistical theories of cities develop further. 

Given their central role in contemporary human societies, we expect that cities will
continue to be an increasing focus of intense and multidisciplinary analyses. We echo the
observation of authors such as Paul Romer or Richard Florida that cities are the prin- 
cipal socio-economic organizational units of the 21st century, much like firms were in 
the 20th century. Whatever form a general theory of cities will ultimately take, its con- 
tours in terms of the advantages of urbanization and its potential limits are beginning 
to take shape. Human societies generally develop organizational structures, economies,
innovations and ways to manage conflict that change across scales in ways that are
approximately self-similar as the result of general social network interaction dynamics. We
believe that the hypothesis of urban scaling expresses these self-consistent changes in a 
synthetic and testable way that may help reveal the general mechanisms underlying large 
scale human sociality. 

Figure 1: Scaling of personal income and patents in US micro and metropolitan areas. (a)
Total personal income in 2006. Metropolitan areas are shown in orange, with micropolitan
areas shown in green. Lines show the best fit to a scaling relation $Y = Y_0 N^{\beta}$ (blue), with
$\beta = 1.07$ (95% CI [1.06, 1.09], $R^2 = 0.66$), the series expansion of the power law in terms of
logs to first order (black), and a fit to the scale-adjusted logarithm $C N \log(N/N_m)$ (red).
The scale-adjusted logarithm and the power law fits are indistinguishable in terms of
quality of fits where data are available. (b) Total number of patents filed in US metro
and micropolitan areas in 2006. Black dots indicate logarithmically binned data for
visual guidance. The power law fit has exponent $\beta = 1.13$ (95% CI [1.01, 1.26], $R^2 =
0.66$). Again, the scale-adjusted logarithm and power law fits are indistinguishable where
data are available but the former predicts that a city with fewer than 200 people would
have zero patents and that smaller cities would display a negative number, a seemingly
counterfactual result of the introduction of a scale $N_m = 200$ in the problem.

Figure 2: Scaling of murders in Brazilian cities and income in Japanese metropolitan
areas. (a) Total number of murders per year in Brazilian Metropolitan Areas (green) and
non-metropolitan Municipalities (orange), binned logarithmically (yellow). Lines show the
best fit to a scaling relation for metropolitan areas plus municipalities, $Y = Y_0 N^{\beta}$ (blue),
with $\beta = 1.20$ (95% CI [1.15, 1.25], $R^2 = 0.96$), a power law fit to metropolitan areas
only (black) with $\beta = 1.30$ (95% CI [1.12, 1.49], $R^2 = 0.85$), and a fit to $C N \log(N/N_m)$, or
scale-adjusted logarithm (red), which dives to zero at the critical city size of 14,454,
which is manifestly wrong given the data on municipalities. (b) Total income in Japanese
Metropolitan Areas (MAs) in 2006 (orange) and prefectures (green). The power law fit
(blue line) has exponent $\beta = 1.10$ (95% CI [1.04, 1.16], $R^2 = 0.99$). The scale-adjusted
logarithm and power law fits are indistinguishable where data are available but the former
goes to zero at a critical city size of $N_m \approx 257$ people.

Figure 3: Scaling of components of personal income in US cities. (a) Total net earnings and
transfers for micro and metropolitan areas in the US in 2007. Note how these two
quantities, the former generated locally in the city and the latter the result of national policy,
scale differently. When added together they result in approximate scaling for personal
income with a lower exponent than generally expected from urban scaling theory [21].
Lines show the best fit to scaling relations with $\beta = 1.10$ (95% CI [1.05, 1.15], $R^2 = 0.98$)
for net earnings and $\beta = 0.96$ (95% CI [0.91, 1.01], $R^2 = 0.97$) for transfers. (b) Net
earnings in US micro and metropolitan areas. The power law fit to the entire range (micros
+ metro areas) is very close to that for metros only (blue). The best fit exponent for
metropolitan areas is $\beta = 1.12$ (95% CI [1.10, 1.14], $R^2 = 0.98$). On the other hand the
scale-adjusted logarithm predicts a city of 83 people with zero net earnings, which seems
counterfactual.
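
The comparison made in the captions above, between a power law and the scale-adjusted logarithm, can be sketched numerically. The following is a minimal illustration on synthetic data, not the authors' fitting procedure: the function names, the noise level, the prefactor and the use of ordinary least squares are all assumptions made for this example. It shows that the scale-adjusted logarithm necessarily crosses zero at a finite size $N_m$ and turns negative below it, whereas the power law stays positive.

```python
import numpy as np

def fit_power_law(N, Y):
    """OLS fit of log Y = log Y0 + beta * log N; returns (Y0, beta)."""
    beta, log_y0 = np.polyfit(np.log(N), np.log(Y), 1)
    return np.exp(log_y0), beta

def fit_scale_adjusted_log(N, Y):
    """Fit Y = C * N * log(N / N_m) by regressing Y/N on log N.

    Y/N = C * log N - C * log N_m, so the slope is C and
    N_m = exp(-intercept / slope). Returns (C, N_m).
    """
    slope, intercept = np.polyfit(np.log(N), Y / N, 1)
    return slope, np.exp(-intercept / slope)

def scale_adjusted_log(N, C, N_m):
    """Evaluate the scale-adjusted logarithm C * N * log(N / N_m)."""
    return C * N * np.log(N / N_m)

# Synthetic "cities" (hypothetical, not real data): sizes from 1e4 to 1e7,
# totals drawn from a power law with beta = 1.13 and lognormal noise.
rng = np.random.default_rng(0)
N = np.logspace(4, 7, 300)
Y = 1e-4 * N**1.13 * np.exp(rng.normal(0.0, 0.3, N.size))

Y0_hat, beta_hat = fit_power_law(N, Y)
C_hat, N_m_hat = fit_scale_adjusted_log(N, Y)

print(f"power law exponent: {beta_hat:.3f}")
print(f"scale-adjusted log crosses zero at N_m = {N_m_hat:.1f}")

# Below N_m the scale-adjusted logarithm is negative; the power law is not.
small = 0.5 * N_m_hat
print("scale-adjusted log at N_m/2:", scale_adjusted_log(small, C_hat, N_m_hat))
print("power law at N_m/2:", Y0_hat * small**beta_hat)
```

Over the range where data exist the two forms can fit comparably well, so the difference appears only on extrapolation to small sizes, which is the point made in the captions.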



